Take-Home Exercise 03

Author

Mayuri Salunke

Published

March 19, 2023

Modified

March 20, 2023

1.0 Overview

1.1 Background

Housing and Development Board (HDB) flats are a crucial aspect of the Singaporean housing market. They serve as the primary form of public housing for over 80% of the population, providing affordable and accessible homes for citizens. Due to the high demand for public housing, the pricing of HDB flats is a crucial issue that affects not just the housing market but also the wider economy and society.

The pricing of HDB flats in Singapore is determined by a range of factors, including location, age, size, and condition of the flat, as well as market demand and supply. The government plays a crucial role in setting the pricing policies for HDB flats, which have significant implications for homeowners, potential buyers, and the broader economy.

Investigating and explaining the factors that affect the resale prices of public housing in Singapore is essential for several reasons. Firstly, understanding these factors provides insight into the dynamics of the housing market, helping policymakers and stakeholders make informed decisions. Secondly, resale prices can significantly affect homeowners’ financial well-being. Thirdly, knowing what drives prices helps homeowners decide when to sell their homes and at what price. Finally, it can help identify areas for intervention or policy changes to keep public housing in Singapore stable and affordable.

1.2 Task

In this take-home exercise, we are tasked with predicting HDB resale prices at the sub-market level (i.e. HDB 3-room, HDB 4-room and HDB 5-room) for the months of January and February 2023 in Singapore. The predictive models must be built using the conventional OLS method and GWR methods. We are also required to compare the performance of the conventional OLS method against the geographically weighted methods.

1.3 Packages Used

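  • sf : importing, managing and processing geospatial (simple features) data

  • tidyverse : importing, wrangling and visualising data (readr, dplyr, stringr, ggplot2)

  • tmap : plotting thematic maps

  • spdep : spatial weights and spatial dependence statistics

  • onemapsgapi : querying the OneMap Singapore API

  • httr : making HTTP requests such as GET()

  • ggmap : spatial visualisation on top of map tiles

  • rvest : web scraping, e.g. html_text()

  • units : handling measurement units, e.g. for distances

  • matrixStats : fast functions on the rows and columns of matrices

  • readxl : reading Excel files

  • jsonlite : parsing JSON, e.g. fromJSON()

  • olsrr : building and diagnosing OLS regression models

  • corrplot : visualising correlation matrices

  • ggpubr : arranging publication-ready ggplot2 plots

  • GWmodel : geographically weighted models, including GWR

  • devtools : installing packages from GitHub

  • kableExtra : formatting tables

  • plotly : interactive plots

  • ggthemes : additional ggplot2 themes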

# initialise a list of required packages
packages = c('sf', 'tidyverse', 'tmap', 'spdep', 'httr', 'ggmap', "rvest",
             'onemapsgapi', 'units', 'matrixStats', 'readxl', 'jsonlite',
             'olsrr', 'corrplot', 'ggpubr', 'GWmodel',
             'devtools', 'kableExtra', 'plotly', 'ggthemes')

# for each package, check if installed and if not, install it
for (p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
# reference for manipulating output messages: https://yihui.org/knitr/demo/output/
devtools::install_github("gadenbuie/xaringanExtra")
library(xaringanExtra)
xaringanExtra::use_panelset()

1.4 Datasets Used

| Type | Name | Format | Source |
|---|---|---|---|
| Aspatial | Resale Flat Prices | .csv | [data.gov.sg](https://data.gov.sg/dataset/resale-flat-prices) |
| Geospatial | MPSZ-2019 | .shp | From Prof. Kam's In-Class Exercise 09 |
| Geospatial | Eldercare Services | .shp | [data.gov.sg](https://data.gov.sg/dataset/) |
| Geospatial | Hawker Centres | .geojson | [data.gov.sg](https://data.gov.sg/dataset/) |
| Geospatial | Gyms | .geojson | [data.gov.sg](https://data.gov.sg/dataset/) |
| Geospatial | General Information of Schools | .csv | [data.gov.sg](https://data.gov.sg/dataset/) |
| Geospatial | Parks | .kml | [data.gov.sg](https://data.gov.sg/dataset/) |
| Geospatial | MRT Locations | .geojson | [data.gov.sg](https://data.gov.sg/dataset/) |
| Geospatial | Kindergartens | .shp | [OneMap API](https://www.onemap.gov.sg/docs/) |
| Geospatial | Pre-School Locations | .shp | [OneMap API](https://www.onemap.gov.sg/docs/) |
| Geospatial | Private Education Institutes | .shp | [OneMap API](https://www.onemap.gov.sg/docs/) |
| Geospatial | Supermarkets | .shp | [OneMap API](https://www.onemap.gov.sg/docs/) |
| Geospatial | Childcare Services | .shp | [OneMap API](https://www.onemap.gov.sg/docs/) |
| Geospatial | Dengue Clusters | .shp | [OneMap API](https://www.onemap.gov.sg/docs/) |
| Geospatial | Bus Stop Locations | .shp | [LTA Data Mall](https://datamall.lta.gov.sg/content/datamall/en/search_datasets.html?searchText=bus%20stop) |
| Geospatial | Shopping Malls | .csv | Wikipedia |

We are considering the following factors to determine the resale price of HDB flats:

  • Floor Level

  • Remaining Lease Period

  • Area of the Unit

  • Age of Unit

  • Storey-Floor

  • Proximity to CBD

  • Proximity to eldercare

  • Proximity to foodcourt/hawker centers

  • Proximity to MRT

  • Proximity to park

  • Proximity to good primary school

  • Proximity to shopping mall

  • Proximity to supermarket

  • Number of Kindergartens within 350m

  • Number of Childcare services within 350m

  • Number of Bus stops within 350m

  • Number of Primary Schools within 1km
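
The proximity and count factors above both reduce to distances between a flat and a set of amenity points. The snippet below is a minimal sketch of that idea in base R, assuming coordinates are already projected in metres (as they are in SVY21). The sample points and the helper name dist_matrix() are made up for illustration and are not taken from the datasets.

```r
# Hypothetical projected coordinates in metres (x = easting, y = northing)
flats <- data.frame(x = c(0, 1000), y = c(0, 0))
amenities <- data.frame(x = c(300, 0, 1200), y = c(0, 400, 0))

# Pairwise Euclidean distances: one row per flat, one column per amenity
dist_matrix <- function(from, to) {
  dx <- outer(from$x, to$x, "-")
  dy <- outer(from$y, to$y, "-")
  sqrt(dx^2 + dy^2)
}

d <- dist_matrix(flats, amenities)

# Proximity: distance from each flat to its nearest amenity
flats$prox_amenity <- apply(d, 1, min)

# Count: number of amenities within a 350 m radius of each flat
flats$num_within_350m <- rowSums(d <= 350)

print(flats)
```

With real data, the same distance matrix could presumably be obtained from the sf point layers instead of a hand-written helper; the minimum-per-row and count-within-radius steps would stay the same.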

2.0 Importing and Wrangling of Aspatial Data

We use the read_csv() function of the readr package to import resale-flat-prices into R as a tibble data frame called resale. The glimpse() function of the dplyr package is then used to display the data structure.

resale <- read_csv("data/aspatial/resale-flat-prices.csv")
glimpse(resale)
Rows: 148,864
Columns: 11
$ month               <chr> "2017-01", "2017-01", "2017-01", "2017-01", "2017-…
$ town                <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO …
$ flat_type           <chr> "2 ROOM", "3 ROOM", "3 ROOM", "3 ROOM", "3 ROOM", …
$ block               <chr> "406", "108", "602", "465", "601", "150", "447", "…
$ street_name         <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 4", "ANG MO K…
$ storey_range        <chr> "10 TO 12", "01 TO 03", "01 TO 03", "04 TO 06", "0…
$ floor_area_sqm      <dbl> 44, 67, 67, 68, 67, 68, 68, 67, 68, 67, 68, 67, 67…
$ flat_model          <chr> "Improved", "New Generation", "New Generation", "N…
$ lease_commence_date <dbl> 1979, 1978, 1980, 1980, 1980, 1981, 1979, 1976, 19…
$ remaining_lease     <chr> "61 years 04 months", "60 years 07 months", "62 ye…
$ resale_price        <dbl> 232000, 250000, 262000, 265000, 265000, 275000, 28…

We can see the following information upon running the glimpse function -

  • The dataset contains 11 columns with 148,864 rows

  • The columns present are the following -

    • month (month here is in the format yyyy-mm)

    • town

    • flat_type

    • block

    • street_name

    • storey_range

    • floor_area_sqm

    • flat_model

    • lease_commence_date

    • remaining_lease

    • resale_price

  • The data starts from January 2017 and covers all flat types, including 2-, 3-, 4- and 5-room and Executive flats. However, we will focus only on 4-room flats sold between 1st January 2021 and 31st December 2022. Hence, we will filter the data.

2.1 Filtering Resale Data

We will use the filter() function of dplyr to keep only the flat type and months we need, storing the result in rs_subset. We will also use the unique() function to check that the flat_type and month values have been extracted correctly.

rs_subset <-  filter(resale,flat_type == "4 ROOM") %>% 
              filter(month >= "2021-01" & month <= "2022-12")
glimpse(rs_subset)
Rows: 23,656
Columns: 11
$ month               <chr> "2021-01", "2021-01", "2021-01", "2021-01", "2021-…
$ town                <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO …
$ flat_type           <chr> "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", …
$ block               <chr> "547", "414", "509", "467", "571", "134", "204", "…
$ street_name         <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 10", "ANG MO …
$ storey_range        <chr> "04 TO 06", "01 TO 03", "01 TO 03", "07 TO 09", "0…
$ floor_area_sqm      <dbl> 92, 92, 91, 92, 92, 98, 92, 92, 92, 92, 92, 109, 9…
$ flat_model          <chr> "New Generation", "New Generation", "New Generatio…
$ lease_commence_date <dbl> 1981, 1979, 1980, 1979, 1979, 1978, 1977, 1978, 19…
$ remaining_lease     <chr> "59 years", "57 years 09 months", "58 years 06 mon…
$ resale_price        <dbl> 370000, 375000, 380000, 385000, 410000, 410000, 41…
unique(rs_subset$month)
 [1] "2021-01" "2021-02" "2021-03" "2021-04" "2021-05" "2021-06" "2021-07"
 [8] "2021-08" "2021-09" "2021-10" "2021-11" "2021-12" "2022-01" "2022-02"
[15] "2022-03" "2022-04" "2022-05" "2022-06" "2022-07" "2022-08" "2022-09"
[22] "2022-10" "2022-11" "2022-12"
unique(rs_subset$flat_type)
[1] "4 ROOM"

From the above results we can see that there are 23,656 transactions for 4-room flats from 1st January 2021 to 31st December 2022.
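
The filter compares month as text. This works because zero-padded yyyy-mm strings sort in the same order as the dates they represent. A small sketch with made-up month values:

```r
# Made-up month labels in the same zero-padded "yyyy-mm" format
months <- c("2020-12", "2021-01", "2021-10", "2022-12", "2023-01")

# Text comparison matches chronological order for this format
in_window <- months >= "2021-01" & months <= "2022-12"
months[in_window]
```

If the months were not zero-padded (for example "2021-1"), this shortcut would no longer be safe and converting to a date type first would be the more cautious choice.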

2.2 Transforming Resale Data Columns

We will use the mutate() function to create a new data frame called rs_transform with the following additional columns -

  • address : concatenation of the block and street_name columns using the paste() function of base R

  • remaining_lease_yr & remaining_lease_mth : the year and month parts of remaining_lease, extracted using the str_sub() function of the stringr package and then converted from character to integer using the as.integer() function of base R

rs_transform <- rs_subset %>%
  mutate(
    address = paste(block, street_name),
    # e.g. "61 years 04 months": characters 1-2 are years, 10-11 are months
    remaining_lease_yr = as.integer(str_sub(remaining_lease, 1, 2)),
    # NA when the lease is in whole years, e.g. "59 years"
    remaining_lease_mth = as.integer(str_sub(remaining_lease, 10, 11))
  )
head(rs_transform)
# A tibble: 6 × 14
  month   town     flat_…¹ block stree…² store…³ floor…⁴ flat_…⁵ lease…⁶ remai…⁷
  <chr>   <chr>    <chr>   <chr> <chr>   <chr>     <dbl> <chr>     <dbl> <chr>  
1 2021-01 ANG MO … 4 ROOM  547   ANG MO… 04 TO …      92 New Ge…    1981 59 yea…
2 2021-01 ANG MO … 4 ROOM  414   ANG MO… 01 TO …      92 New Ge…    1979 57 yea…
3 2021-01 ANG MO … 4 ROOM  509   ANG MO… 01 TO …      91 New Ge…    1980 58 yea…
4 2021-01 ANG MO … 4 ROOM  467   ANG MO… 07 TO …      92 New Ge…    1979 57 yea…
5 2021-01 ANG MO … 4 ROOM  571   ANG MO… 07 TO …      92 New Ge…    1979 57 yea…
6 2021-01 ANG MO … 4 ROOM  134   ANG MO… 07 TO …      98 New Ge…    1978 56 yea…
# … with 4 more variables: resale_price <dbl>, address <chr>,
#   remaining_lease_yr <int>, remaining_lease_mth <int>, and abbreviated
#   variable names ¹​flat_type, ²​street_name, ³​storey_range, ⁴​floor_area_sqm,
#   ⁵​flat_model, ⁶​lease_commence_date, ⁷​remaining_lease

2.3 Sum up remaining lease in months

Some values in remaining_lease_mth are NA, because leases stated in whole years have no month part. We first convert these NA values to 0 with the help of the is.na() function. We then convert remaining_lease_yr into months by multiplying it by 12. Finally, we replace these two columns with their sum, remaining_lease_mths, using the rowSums() and mutate() functions. This column shows the total remaining lease period in months.

# leases stated in whole years have no month part, so treat it as 0
rs_transform$remaining_lease_mth[is.na(rs_transform$remaining_lease_mth)] <- 0
# convert the year part to months
rs_transform$remaining_lease_yr <- rs_transform$remaining_lease_yr * 12
# total remaining lease in months, then keep only the columns we need
rs_transform <- rs_transform %>% 
  mutate(remaining_lease_mths = rowSums(rs_transform[, c("remaining_lease_yr", "remaining_lease_mth")])) %>%
  select(month, town, address, block, street_name, flat_type, storey_range, floor_area_sqm, flat_model, 
         lease_commence_date, remaining_lease_mths, resale_price)
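
As a worked example of the conversion, the sketch below parses three sample strings in the same format as remaining_lease and totals them in months. It uses base R substr() in place of str_sub() so that it runs without any packages; the sample vector is illustrative.

```r
# Sample values in the same format as the remaining_lease column
remaining_lease <- c("61 years 04 months", "59 years", "57 years 09 months")

# Characters 1-2 hold the years; characters 10-11 hold the months (if present)
yrs <- as.integer(substr(remaining_lease, 1, 2))
mths <- as.integer(substr(remaining_lease, 10, 11))

# A lease stated in whole years has no month part, so treat it as 0 months
mths[is.na(mths)] <- 0L

remaining_lease_mths <- yrs * 12L + mths
remaining_lease_mths
```

Here 61 years 04 months becomes 61 × 12 + 4 = 736 months, 59 years becomes 708 months, and 57 years 09 months becomes 693 months.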

2.4 Retrieval of Address

We will be retrieving data such as postal codes and coordinates of the address which will be essential in finding the proximity to locational factors.

2.4.1 Creating a list storing unique addresses

We will use the unique() function of base R to extract the unique addresses and the sort() function of base R to sort them. We store the result in a vector so that we do not send more GET requests than required.

add_list <- sort(unique(rs_transform$address))

2.4.2 Creating a function to retrieve coordinates from OneMap.sg API

We will use the GET() function of the httr package to make a GET request to https://developers.onemap.sg/commonapi/search. This allows us to query spatial data in a tidy format. The retrieved coordinates will then be stored in a data frame called postal_coords. We also need to take note of the following variables, which will be used in our GET() request.

  • searchVal: Keywords entered by user that is used to filter out the results.

  • returnGeom {Y/N}: Checks if user wants to return the geometry.

  • getAddrDetails {Y/N}: Checks if user wants to return address details for a point.

Note that the returned JSON response contains multiple fields, but we are only interested in the postal code and the coordinates (longitude and latitude). We create a data frame new_row to store the final set of coordinates retrieved in each iteration of the loop. This lets us check the number of responses returned before appending to postal_coords (using rbind()), as some locations return a single result while others return multiple sets of postal codes. We also need to check whether the address is invalid by looking at the number of results returned (i.e. found = 0).

get_coords <- function(add_list){
  
  # Create a data frame to store all retrieved coordinates
  postal_coords <- data.frame()
    
  for (i in add_list){
    #print(i)

    r <- GET('https://developers.onemap.sg/commonapi/search?',
           query=list(searchVal=i,
                     returnGeom='Y',
                     getAddrDetails='Y'))
    data <- fromJSON(rawToChar(r$content))
    found <- data$found
    res <- data$results
    
    # Create a new data frame for each address
    new_row <- data.frame()
    
    # If single result, append 
    if (found == 1){
      postal <- res$POSTAL 
      lat <- res$LATITUDE
      lng <- res$LONGITUDE
      new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
    }
    
    # If multiple results, drop NIL and append top 1
    else if (found > 1){
      # Remove those with NIL as postal
      res_sub <- res[res$POSTAL != "NIL", ]
      
      # Set as NA first if no Postal
      if (nrow(res_sub) == 0) {
          new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
      }
      
      else{
        top1 <- head(res_sub, n = 1)
        postal <- top1$POSTAL 
        lat <- top1$LATITUDE
        lng <- top1$LONGITUDE
        new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
      }
    }

    else {
      new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
    }
    
    # Add the row
    postal_coords <- rbind(postal_coords, new_row)
  }
  return(postal_coords)
}
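
The branching inside get_coords() can be tried out without calling the API. The sketch below isolates the selection step in a hypothetical helper, pick_result(), and runs it on an invented response. The field names POSTAL, LATITUDE and LONGITUDE follow the code above; the coordinate values are made up.

```r
# Pick one postal code and coordinate pair from a parsed search response.
# Mirrors the rules in get_coords(): keep a single result as-is, drop "NIL"
# postal codes when there are several results, and return NA when none remain.
pick_result <- function(address, found, results) {
  empty <- data.frame(address = address, postal = NA,
                      latitude = NA, longitude = NA)
  if (found == 0) return(empty)
  if (found > 1) {
    results <- results[results$POSTAL != "NIL", , drop = FALSE]
    if (nrow(results) == 0) return(empty)
  }
  top <- results[1, ]
  data.frame(address = address, postal = top$POSTAL,
             latitude = top$LATITUDE, longitude = top$LONGITUDE)
}

# Invented response with two matches, the first lacking a postal code
mock_results <- data.frame(
  POSTAL = c("NIL", "560547"),
  LATITUDE = c("1.3700", "1.3742"),
  LONGITUDE = c("103.8500", "103.8582"),
  stringsAsFactors = FALSE
)

multi <- pick_result("547 ANG MO KIO AVE 10", found = 2, results = mock_results)
none <- pick_result("NO SUCH ADDRESS", found = 0, results = data.frame())
print(multi)
print(none)
```

Because a single result is kept even when its postal code is "NIL", an address can come back with coordinates but no usable postal code, so checking the output for "NIL" as well as NA is worthwhile.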

2.4.3 Retrieve Resale Coordinates

coords <- get_coords(add_list)

2.4.3 Check Results

We will use the is.na() function of base R to check whether any of the relevant columns contain NA values. We also check for postal codes returned as "NIL".

coords[(is.na(coords$postal) | is.na(coords$latitude) | is.na(coords$longitude) | coords$postal=="NIL"), ]
                    address postal         latitude        longitude
1305 215 CHOA CHU KANG CTRL    NIL 1.38308302434129 103.747076627693

From the above output, we can see that the postal code of 215 CHOA CHU KANG CTRL is missing. Upon searching for it online, we find that the postal code is 680215.

2.4.4 Combine Resale and Coordinates Data

We will now use the left_join() function of the dplyr package to combine the successfully retrieved coordinates with our transformed resale dataset.

rs_coords <- left_join(rs_transform, coords, by = c('address' = 'address'))
head(rs_coords)
# A tibble: 6 × 15
  month   town     address block stree…¹ flat_…² store…³ floor…⁴ flat_…⁵ lease…⁶
  <chr>   <chr>    <chr>   <chr> <chr>   <chr>   <chr>     <dbl> <chr>     <dbl>
1 2021-01 ANG MO … 547 AN… 547   ANG MO… 4 ROOM  04 TO …      92 New Ge…    1981
2 2021-01 ANG MO … 414 AN… 414   ANG MO… 4 ROOM  01 TO …      92 New Ge…    1979
3 2021-01 ANG MO … 509 AN… 509   ANG MO… 4 ROOM  01 TO …      91 New Ge…    1980
4 2021-01 ANG MO … 467 AN… 467   ANG MO… 4 ROOM  07 TO …      92 New Ge…    1979
5 2021-01 ANG MO … 571 AN… 571   ANG MO… 4 ROOM  07 TO …      92 New Ge…    1979
6 2021-01 ANG MO … 134 AN… 134   ANG MO… 4 ROOM  07 TO …      98 New Ge…    1978
# … with 5 more variables: remaining_lease_mths <dbl>, resale_price <dbl>,
#   postal <chr>, latitude <chr>, longitude <chr>, and abbreviated variable
#   names ¹​street_name, ²​flat_type, ³​storey_range, ⁴​floor_area_sqm,
#   ⁵​flat_model, ⁶​lease_commence_date

2.4.5 Handling NIL data

We now need to add the postal code of 215 CHOA CHU KANG CTRL. As the output below shows, the postal column is of character type rather than integer, so we will replace the NIL with the postal code as a character string.

typeof(rs_coords$postal)
[1] "character"
rs_coords[rs_coords$address == '215 CHOA CHU KANG CTRL', "postal"] <- "680215"

Let's verify that the postal code has been replaced and that there are no more NA or NIL values.

rs_coords[(is.na(rs_coords$postal) | is.na(rs_coords$latitude) | is.na(rs_coords$longitude) | rs_coords$postal=="NIL"), ]
# A tibble: 0 × 15
# … with 15 variables: month <chr>, town <chr>, address <chr>, block <chr>,
#   street_name <chr>, flat_type <chr>, storey_range <chr>,
#   floor_area_sqm <dbl>, flat_model <chr>, lease_commence_date <dbl>,
#   remaining_lease_mths <dbl>, resale_price <dbl>, postal <chr>,
#   latitude <chr>, longitude <chr>

2.4.6 Write file to RDS

Since our subset of the resale dataset is now complete with coordinates, we can save it as an RDS file to avoid running the GET() requests again.

rs_coords_rds <- write_rds(rs_coords, "data/rds/rs_coords.rds")

Now let's read the RDS file to verify that it was saved properly.

rs_coords <- read_rds("data/rds/rs_coords.rds")
glimpse(rs_coords)
Rows: 23,656
Columns: 15
$ month                <chr> "2021-01", "2021-01", "2021-01", "2021-01", "2021…
$ town                 <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO…
$ address              <chr> "547 ANG MO KIO AVE 10", "414 ANG MO KIO AVE 10",…
$ block                <chr> "547", "414", "509", "467", "571", "134", "204", …
$ street_name          <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 10", "ANG MO…
$ flat_type            <chr> "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM",…
$ storey_range         <chr> "04 TO 06", "01 TO 03", "01 TO 03", "07 TO 09", "…
$ floor_area_sqm       <dbl> 92, 92, 91, 92, 92, 98, 92, 92, 92, 92, 92, 109, …
$ flat_model           <chr> "New Generation", "New Generation", "New Generati…
$ lease_commence_date  <dbl> 1981, 1979, 1980, 1979, 1979, 1978, 1977, 1978, 1…
$ remaining_lease_mths <dbl> 708, 693, 702, 695, 689, 681, 661, 682, 692, 692,…
$ resale_price         <dbl> 370000, 375000, 380000, 385000, 410000, 410000, 4…
$ postal               <chr> "560547", "560414", "560509", "560467", "560571",…
$ latitude             <chr> "1.37420951743562", "1.36390466431674", "1.374000…
$ longitude            <chr> "103.858209667888", "103.853913839503", "103.8501…

2.4.7 Assign and Transform CRS

Since the longitudes and latitudes are in decimal degrees, the CRS is WGS84. Hence, we first assign EPSG code 4326 and then transform the data to EPSG code 3414, which is SVY21 (Singapore).

rs_coords_sf <- st_as_sf(rs_coords,
                    coords = c("longitude", 
                               "latitude"),
                    crs=4326) %>%
  st_transform(crs = 3414)
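
As a quick sanity check of what the transformation does, the sketch below projects a single point: the natural origin of the SVY21 projection (longitude 103.833333333333, latitude 1.36666666666667, as listed in the EPSG:3414 definition). Assuming sf is installed, the result should land very close to the false easting and false northing of that definition, 28001.642 and 38744.572 metres. The object names are illustrative.

```r
library(sf)

# Natural origin of the SVY21 projection, in WGS84 degrees (x = lon, y = lat)
origin_wgs84 <- st_sfc(st_point(c(103.833333333333, 1.36666666666667)),
                       crs = 4326)

# Project to SVY21 / Singapore TM (EPSG:3414); units become metres
origin_svy21 <- st_transform(origin_wgs84, crs = 3414)
xy <- st_coordinates(origin_svy21)
xy
```

Coordinates in the tens of thousands of metres, rather than values near 103 and 1, are a simple sign that a layer is already in SVY21.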

Now let's check that the CRS has been successfully transformed.

st_crs(rs_coords_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

2.4.8 Checking for Invalid Geometries

length(which(st_is_valid(rs_coords_sf) == FALSE))
[1] 0

2.4.9 Plotting HDB Resale Points

tmap_mode("view")
tm_shape(rs_coords_sf)+
  tm_dots(col="blue", size = 0.02)
tmap_mode("plot")

3.0 Importing Geospatial Locational Factors

3.1 Locational Factors with Geographic Coordinates

We will begin by reading each file as a simple features object and then retrieving its coordinate reference system.

bus_sf <- st_read("data/geospatial/BusStop.shp")
Reading layer `BusStop' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\BusStop.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
childcare_sf <- st_read("data/geospatial/CHILDCARE.shp")
Reading layer `CHILDCARE' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\CHILDCARE.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 1545 features and 15 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 11203.01 ymin: 25667.6 xmax: 45404.24 ymax: 49300.88
Projected CRS: WGS_1984_Transverse_Mercator
dengue_sf <- st_read("data/geospatial/DENGUE_CLUSTER.shp")
Reading layer `DENGUE_CLUSTER' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\DENGUE_CLUSTER.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 15 features and 9 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 13806.9 ymin: 32420.76 xmax: 40196.43 ymax: 43193.38
Projected CRS: SVY21
elder_sf <- st_read("data/geospatial/ELDERCARE.shp")
Reading layer `ELDERCARE' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\ELDERCARE.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 133 features and 18 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21
gym_sf <- st_read("data/geospatial/gyms-sg.geojson")
Reading layer `gyms-sg' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\gyms-sg.geojson' 
  using driver `GeoJSON'
Simple feature collection with 159 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6938 ymin: 1.262063 xmax: 103.9518 ymax: 1.435078
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
hawker_sf <- st_read("data/geospatial/hawker-centres-geojson.geojson")
Reading layer `hawker-centres-geojson' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\hawker-centres-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 125 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6974 ymin: 1.272716 xmax: 103.9882 ymax: 1.449217
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
kindergartens_sf <- st_read("data/geospatial/KINDERGARTENS.shp") 
Reading layer `KINDERGARTENS' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\KINDERGARTENS.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 448 features and 15 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 11909.7 ymin: 25596.33 xmax: 43395.47 ymax: 48562.06
Projected CRS: SVY21
mrt_sf <- st_read("data/geospatial/lta-mrt-station-exit-geojson.geojson")
Reading layer `lta-mrt-station-exit-geojson' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\lta-mrt-station-exit-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 474 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6368 ymin: 1.264972 xmax: 103.9893 ymax: 1.449157
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
privateInst_sf <- st_read("data/geospatial/CPE_PEI_PREMISES.shp")
Reading layer `CPE_PEI_PREMISES' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\CPE_PEI_PREMISES.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 284 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 10805.79 ymin: 28394.92 xmax: 40287.11 ymax: 47937.08
Projected CRS: SVY21
parks_sf <- st_read("data/geospatial/parks.kml")
Reading layer `NATIONALPARKS_New' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\parks.kml' 
  using driver `KML'
Simple feature collection with 421 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6929 ymin: 1.214491 xmax: 104.0538 ymax: 1.462094
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
preschools_sf <- st_read("data/geospatial/PRESCHOOLS_LOCATION.shp")
Reading layer `PRESCHOOLS_LOCATION' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\PRESCHOOLS_LOCATION.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 1925 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 11203.01 ymin: 25596.33 xmax: 45404.24 ymax: 49300.88
Projected CRS: SVY21
supermarket_sf <- st_read("data/geospatial/SUPERMARKETS.shp") 
Reading layer `SUPERMARKETS' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial\SUPERMARKETS.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 526 features and 8 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4901.188 ymin: 25529.08 xmax: 46948.22 ymax: 49233.6
Projected CRS: SVY21
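
The outputs above show a mix of projected layers (in metres) and geodetic WGS 84 layers (in degrees), such as the gyms, hawker centres, MRT exits and parks. Rather than reading every CRS printout, a compact summary can flag which layers still need to be transformed. The sketch below is one possible way to do this, using two tiny in-memory layers as stand-ins for the real ones; the list name layers and the sample points are assumptions for illustration.

```r
library(sf)

# Stand-in layers: one geodetic (degrees) and one projected (metres)
layers <- list(
  gym = st_sf(id = 1, geometry = st_sfc(st_point(c(103.85, 1.30)), crs = 4326)),
  bus = st_sf(id = 1, geometry = st_sfc(st_point(c(30000, 35000)), crs = 3414))
)

# One row per layer: the CRS as supplied, and whether it is in lon/lat degrees
crs_summary <- data.frame(
  layer = names(layers),
  crs_input = sapply(layers, function(x) st_crs(x)$input),
  is_longlat = sapply(layers, st_is_longlat),
  row.names = NULL
)
crs_summary
```

Layers with is_longlat equal to TRUE are the ones that would need st_transform() to EPSG:3414 before distances in metres can be computed.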
st_crs(bus_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(childcare_sf)
Coordinate Reference System:
  User input: WGS_1984_Transverse_Mercator 
  wkt:
PROJCRS["WGS_1984_Transverse_Mercator",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(dengue_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(elder_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(gym_sf)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(hawker_sf)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(kindergartens_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(mrt_sf)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(privateInst_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(parks_sf)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(preschools_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(supermarket_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

As we can see, the following datasets have WGS 84 as their geodetic CRS -

  • childcare_sf

  • gym_sf

  • hawker_sf

  • mrt_sf

  • parks_sf

The rest of the datasets, including privateInst_sf, are already in the SVY21 projected CRS. However, the only EPSG code in their definition is 6326, which identifies the WGS 84 datum rather than the projected CRS. The correct EPSG code for SVY21 in Singapore is 3414.

3.1.1 Assign the correct EPSG code to sf dataframes

# Reproject the layers that are in WGS 84 (EPSG:4326) to SVY21 (EPSG:3414)
childcare_sf <- childcare_sf %>%
  st_transform(crs = 3414)
gym_sf <- gym_sf %>%
  st_transform(crs = 3414)
hawker_sf <- hawker_sf %>%
  st_transform(crs = 3414)
mrt_sf <- mrt_sf %>%
  st_transform(crs = 3414)
parks_sf <- parks_sf %>%
  st_transform(crs = 3414)

# Layers already in SVY21 coordinates only need the correct EPSG code assigned
bus_sf <- st_set_crs(bus_sf, 3414)
dengue_sf <- st_set_crs(dengue_sf, 3414)
elder_sf <- st_set_crs(elder_sf, 3414)
kindergartens_sf <- st_set_crs(kindergartens_sf, 3414)
preschools_sf <- st_set_crs(preschools_sf, 3414)
privateInst_sf <- st_set_crs(privateInst_sf, 3414)
supermarket_sf <- st_set_crs(supermarket_sf, 3414)
st_crs(childcare_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(gym_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(hawker_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(privateInst_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(bus_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(dengue_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(elder_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(kindergartens_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(mrt_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(parks_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(preschools_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(supermarket_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

All the datasets now report EPSG 3414 as their CRS.
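The two functions used in this step behave differently, and the difference matters when a layer is still in degrees. The sketch below is only an illustration on a single point (the CBD coordinates used later in this exercise), not part of the original workflow: st_transform() recomputes the coordinates, while st_set_crs() only changes the label attached to them.

```r
library(sf)

# A single point in geographic coordinates (longitude, latitude), WGS 84.
pt_wgs84 <- st_sfc(st_point(c(103.851784, 1.287953)), crs = 4326)

# st_transform() reprojects: the coordinates are recomputed in metres.
pt_svy21 <- st_transform(pt_wgs84, crs = 3414)
st_coordinates(pt_svy21)

# st_set_crs() only relabels: the numbers stay as they were (degrees),
# but would now be interpreted as metres. sf warns about this, so the
# warning is suppressed here purely to keep the illustration quiet.
pt_relabelled <- suppressWarnings(st_set_crs(pt_wgs84, 3414))
st_coordinates(pt_relabelled)
```

For this reason, st_set_crs() is only appropriate for layers whose coordinates are already in SVY21 metres, and st_transform() should be used for layers stored in WGS 84.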

3.1.2 Check for Invalid Geometries

Now that we have assigned the correct EPSG code, we will check for invalid geometries using the length() and st_is_valid() functions so that later steps do not fail.

length(which(st_is_valid(bus_sf) == FALSE))
[1] 0
length(which(st_is_valid(childcare_sf) == FALSE))
[1] 0
length(which(st_is_valid(dengue_sf) == FALSE))
[1] 0
length(which(st_is_valid(elder_sf) == FALSE))
[1] 0
length(which(st_is_valid(gym_sf) == FALSE))
[1] 0
length(which(st_is_valid(hawker_sf) == FALSE))
[1] 0
length(which(st_is_valid(kindergartens_sf) == FALSE))
[1] 0
length(which(st_is_valid(mrt_sf) == FALSE))
[1] 0
length(which(st_is_valid(privateInst_sf) == FALSE))
[1] 0
length(which(st_is_valid(parks_sf) == FALSE))
[1] 0
length(which(st_is_valid(preschools_sf) == FALSE))
[1] 0
length(which(st_is_valid(supermarket_sf) == FALSE))
[1] 0
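The same check can be run over many layers at once. The following is a minimal sketch, assuming the layers are collected in a named list; the two toy layers (toy_a, toy_b) are hypothetical stand-ins for the actual datasets.

```r
library(sf)

# Two toy point layers standing in for the amenity datasets (hypothetical data).
layers <- list(
  toy_a = st_sf(geometry = st_sfc(st_point(c(1, 1)), st_point(c(2, 2)), crs = 3414)),
  toy_b = st_sf(geometry = st_sfc(st_point(c(3, 3)), crs = 3414))
)

# Count invalid geometries per layer; a count of 0 means the layer is clean.
invalid_counts <- sapply(layers, function(x) sum(!st_is_valid(x)))
invalid_counts
```

With the real data, the list would hold bus_sf, childcare_sf and the other layers, and the result would be a single named vector of counts instead of twelve separate outputs.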

3.1.3 Calculating Proximity

We begin by creating a proximity function. It first builds a matrix of distances between each HDB flat and each locational factor using st_distance(). It then takes the minimum of each row with min() to get the distance to the nearest locational factor, divides it by 1000 to convert metres to kilometres, and adds the result to the HDB resale dataset using mutate(). Finally, it renames the new column according to the input so that the column names are unique and descriptive.

get_prox <- function(origin_df, dest_df, col_name){
  
  # creates a matrix of distances
  dist_matrix <- st_distance(origin_df, dest_df)           
  
  # find the nearest location_factor and create new data frame
  near <- origin_df %>% 
    mutate(PROX = apply(dist_matrix, 1, function(x) min(x)) / 1000) 
  
  # rename column name according to input parameter
  names(near)[names(near) == 'PROX'] <- col_name

  # Return df
  return(near)
}
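To make the row-wise minimum concrete, here is a small sketch in base R. The distance values are made up for illustration, and a plain numeric matrix stands in for the output of st_distance().

```r
# Hypothetical distances in metres: 2 flats (rows) x 3 amenities (columns).
dist_matrix <- matrix(c(420, 150, 980,
                        2300, 1750, 640),
                      nrow = 2, byrow = TRUE)

# The row-wise minimum is the distance from each flat to its nearest amenity;
# dividing by 1000 converts metres to kilometres.
nearest_km <- apply(dist_matrix, 1, min) / 1000
nearest_km
# 0.15 0.64
```

Each row corresponds to one resale flat, so the result has one proximity value per flat, which is what get_prox() attaches as a new column.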

Now, let's call this function to calculate the proximity of each resale HDB flat to these locational factors.

rs_coords_sf <- get_prox(rs_coords_sf, bus_sf, "PROX_BusStops") 
rs_coords_sf <- get_prox(rs_coords_sf, childcare_sf, "PROX_ChildCare") 
rs_coords_sf <- get_prox(rs_coords_sf, dengue_sf, "PROX_Dengue") 
rs_coords_sf <- get_prox(rs_coords_sf, elder_sf, "PROX_ElderCare") 
rs_coords_sf <- get_prox(rs_coords_sf, gym_sf, "PROX_Gym") 
rs_coords_sf <- get_prox(rs_coords_sf, hawker_sf, "PROX_HawkerCentre") 
rs_coords_sf <- get_prox(rs_coords_sf, kindergartens_sf, "PROX_Kindergartens") 
rs_coords_sf <- get_prox(rs_coords_sf, mrt_sf, "PROX_MRT") 
rs_coords_sf <- get_prox(rs_coords_sf, privateInst_sf, "PROX_PrivateInstitutes") 
rs_coords_sf <- get_prox(rs_coords_sf, parks_sf, "PROX_Parks") 
rs_coords_sf <- get_prox(rs_coords_sf, preschools_sf, "PROX_PreSchools") 
rs_coords_sf <- get_prox(rs_coords_sf, supermarket_sf, "PROX_Supermarket") 

3.1.4 Calculating number of factors within Distance

Next, we create a function that builds a matrix of distances between each HDB flat and each locational factor using st_distance(). It then uses sum() to count the locational factors that lie within a given threshold distance (in metres) and adds this count to the HDB resale data using mutate(). The new column is named according to the input given by the user so that it is unique and descriptive.

get_within <- function(origin_df, dest_df, threshold_dist, col_name){
  
  # creates a matrix of distances
  dist_matrix <- st_distance(origin_df, dest_df)   
  
  # count the number of location_factors within threshold_dist and create new data frame
  wdist <- origin_df %>% 
    mutate(WITHIN_DT = apply(dist_matrix, 1, function(x) sum(x <= threshold_dist)))
  
  # rename column name according to input parameter
  names(wdist)[names(wdist) == 'WITHIN_DT'] <- col_name

  # Return df
  return(wdist)
  
}
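As a cross-check on the counting logic, the same counts can be obtained with sf's st_is_within_distance(), which returns, for each flat, the indices of the points within the threshold. This is only a sketch on made-up toy points (make_pts() is a hypothetical helper), and it assumes the threshold is given in the units of the CRS, which are metres for EPSG:3414.

```r
library(sf)

# Hypothetical helper: build a small sf point layer in SVY21 (EPSG:3414)
make_pts <- function(x, y) {
  st_as_sf(data.frame(x = x, y = y), coords = c("x", "y"), crs = 3414)
}

# Toy data (made up): two "flats" and four "bus stops", coordinates in metres
flats <- make_pts(c(0, 1000), c(0, 0))
stops <- make_pts(c(0, 300, 1200, 5000), c(100, 0, 0, 0))

# Matrix approach, as in get_within(): count row entries within the threshold
dist_matrix <- units::drop_units(st_distance(flats, stops))
count_matrix <- apply(dist_matrix, 1, function(x) sum(x <= 350))

# Alternative: one index list per flat, counted with lengths()
count_sparse <- lengths(st_is_within_distance(flats, stops, dist = 350))

print(count_matrix)
print(count_sparse)
```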

As per the requirements, we need to count the locational factors within the given distance for the following factors:

  • Kindergartens - 350m

  • Childcare centers - 350m

  • Bus stops - 350m

  • Primary School - 1km

  • Preschools - 1km (additional)

  • Private Institutes - 1km (additional)

Note - We are yet to pre-process the data for Primary Schools, so we will be finding the number of Primary Schools within 1km later.

rs_coords_sf <- get_within(rs_coords_sf, kindergartens_sf, 350, "Within_350M_Kindergarten")
rs_coords_sf <- get_within(rs_coords_sf, childcare_sf, 350, "Within_350M_ChildCare")
rs_coords_sf <- get_within(rs_coords_sf, bus_sf, 350, "Within_350M_BusStops")
rs_coords_sf <- get_within(rs_coords_sf, preschools_sf, 1000, "Within_1KM_PreSchools")
rs_coords_sf <- get_within(rs_coords_sf, privateInst_sf, 1000, "Within_1KM_PrivateInstitute")

3.2 Locational Factors without Geographic Coordinates

We will now begin with pre-processing data for which we don’t have geographic coordinates.

3.2.1 CBD Area

Upon doing some research, we can refer to ‘Downtown Core’ as the Central Business District (CBD) area. From LatLong.net, we get the latitude (1.287953) and longitude (103.851784) of the CBD.

So now that we have the latitude and longitude, all we need to do is convert them to EPSG:3414 (SVY21) before we run the get_prox() function.

name <- c('CBD Area')
latitude= c(1.287953)
longitude= c(103.851784)
cbd_coords <- data.frame(name, latitude, longitude)
cbd_coords_sf <- st_as_sf(cbd_coords,
                    coords = c("longitude", 
                               "latitude"),
                    crs=4326) %>%
  st_transform(crs = 3414)
st_crs(cbd_coords_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Now that we have verified that the CRS is correct, we can run the get_prox() function to calculate the proximity of the HDB flats to the CBD area.

rs_coords_sf <- get_prox(rs_coords_sf, cbd_coords_sf, "PROX_CBD") 
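Since latitude and longitude are easy to mix up, a quick sanity check on the projected point can help. The sketch below rebuilds the CBD point and checks that its SVY21 coordinates fall inside a rough plausibility window; the window limits are our own assumption, loosely based on the false easting and northing shown in the CRS output above, and not an official extent.

```r
library(sf)

# coords is given as c(x, y), i.e. longitude first, then latitude
cbd_check <- st_as_sf(
  data.frame(name = "CBD Area", latitude = 1.287953, longitude = 103.851784),
  coords = c("longitude", "latitude"),
  crs = 4326
)
cbd_check <- st_transform(cbd_check, crs = 3414)

# Projected coordinates in metres (X = easting, Y = northing)
cbd_xy <- st_coordinates(cbd_check)
print(cbd_xy)

# Rough plausibility window for Singapore in SVY21 (assumed limits)
in_range <- cbd_xy[1, "X"] > 0 && cbd_xy[1, "X"] < 60000 &&
  cbd_xy[1, "Y"] > 0 && cbd_xy[1, "Y"] < 55000
print(in_range)
```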

3.2.2 Shopping Malls

shopping_malls <- read_csv("data/geospatial/shopping_malls.csv")
glimpse(shopping_malls)
Rows: 184
Columns: 4
$ ...1      <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
$ latitude  <dbl> 1.274588, 1.305087, 1.301385, 1.312025, 1.334042, 1.437131, …
$ longitude <dbl> 103.8435, 103.9051, 103.8377, 103.7650, 103.8510, 103.7953, …
$ name      <chr> "100 AM", "112 KATONG", "313@SOMERSET", "321 CLEMENTI", "600…
shopping_sf <- st_as_sf(shopping_malls,
                              coords = c("longitude",
                                         "latitude"),
                              crs = 4326) %>%
  st_transform(crs = 3414)
st_crs(shopping_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Let's check for any invalid geometries so that we do not run into errors later when we calculate the proximity or plot the map.

length(which(!st_is_valid(shopping_sf)))
[1] 0

As we can see, there are no invalid geometries. And since we have verified that the CRS is correct, we can run the get_prox() function to calculate the proximity of the HDB flats to shopping malls.

rs_coords_sf <- get_prox(rs_coords_sf, shopping_sf, "PROX_ShoppingMalls") 

3.2.3 Primary Schools

First, let's read the CSV file containing all the schools in Singapore.

pri_schl <- read_csv("data/geospatial/general-information-of-schools.csv")
glimpse(pri_schl)
Rows: 346
Columns: 31
$ school_name        <chr> "ADMIRALTY PRIMARY SCHOOL", "ADMIRALTY SECONDARY SC…
$ url_address        <chr> "https://admiraltypri.moe.edu.sg/", "http://www.adm…
$ address            <chr> "11   WOODLANDS CIRCLE", "31   WOODLANDS CRESCENT",…
$ postal_code        <chr> "738907", "737916", "768643", "768928", "579646", "…
$ telephone_no       <chr> "63620598", "63651733", "67592906", "67585384", "64…
$ telephone_no_2     <chr> "na", "63654596", "na", "na", "na", "na", "na", "na…
$ fax_no             <chr> "63627512", "63652774", "67592927", "67557778", "64…
$ fax_no_2           <chr> "na", "na", "na", "na", "na", "na", "na", "na", "na…
$ email_address      <chr> "ADMIRALTY_PS@MOE.EDU.SG", "Admiralty_SS@moe.edu.sg…
$ mrt_desc           <chr> "Admiralty Station", "ADMIRALTY MRT", "Yishun", "CA…
$ bus_desc           <chr> "TIBS 965, 964, 913", "904", "Yishun Ring Road - 81…
$ principal_name     <chr> "MR PEK WEE HAUR", "MR LAM YUI- P'NG", "MISS ONG LE…
$ first_vp_name      <chr> "MDM CHUA MUI LING", "MR NG SONG LIM STEVEN", "MADA…
$ second_vp_name     <chr> "MDM NUR SABARIAH BTE MOHD IBRAHIM", "MR SHEIK ALAU…
$ third_vp_name      <chr> "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NU…
$ fourth_vp_name     <chr> "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NU…
$ fifth_vp_name      <chr> "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NU…
$ sixth_vp_name      <chr> "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NU…
$ dgp_code           <chr> "WOODLANDS", "WOODLANDS", "YISHUN", "YISHUN", "BISH…
$ zone_code          <chr> "NORTH", "NORTH", "NORTH", "NORTH", "SOUTH", "SOUTH…
$ type_code          <chr> "GOVERNMENT SCHOOL", "GOVERNMENT SCHOOL", "GOVERNME…
$ nature_code        <chr> "CO-ED SCHOOL", "CO-ED SCHOOL", "CO-ED SCHOOL", "CO…
$ session_code       <chr> "FULL DAY", "SINGLE SESSION", "SINGLE SESSION", "SI…
$ mainlevel_code     <chr> "PRIMARY", "SECONDARY", "PRIMARY", "SECONDARY", "PR…
$ sap_ind            <chr> "No", "No", "No", "No", "Yes", "No", "No", "No", "N…
$ autonomous_ind     <chr> "No", "No", "No", "No", "No", "No", "No", "No", "Ye…
$ gifted_ind         <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No…
$ ip_ind             <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No…
$ mothertongue1_code <chr> "Chinese", "Chinese", "Chinese", "Chinese", "Chines…
$ mothertongue2_code <chr> "Malay", "Malay", "Malay", "Malay", "na", "Malay", …
$ mothertongue3_code <chr> "Tamil", "Tamil", "Tamil", "Tamil", "na", "Tamil", …

We can see that the “mainlevel_code” column categorizes each school as primary, secondary, mixed levels, or junior college. So let's keep only the primary schools, as per the requirement, along with the columns we need.

pri_schl <- pri_schl %>%
  filter(mainlevel_code == "PRIMARY") %>%
  select(school_name, address, postal_code, mainlevel_code)
glimpse(pri_schl)
Rows: 183
Columns: 4
$ school_name    <chr> "ADMIRALTY PRIMARY SCHOOL", "AHMAD IBRAHIM PRIMARY SCHO…
$ address        <chr> "11   WOODLANDS CIRCLE", "10   YISHUN STREET 11", "100 …
$ postal_code    <chr> "738907", "768643", "579646", "159016", "544969", "5697…
$ mainlevel_code <chr> "PRIMARY", "PRIMARY", "PRIMARY", "PRIMARY", "PRIMARY", …

We can see that there are 183 primary schools in Singapore. Let's create a list of their postal codes and then retrieve the coordinates for these postal codes.

# List to store the postal codes
prisch_list <- sort(unique(pri_schl$postal_code))
# Calling the get_coords() function to retrieve the coordinates of the primary schools
prisch_coords <- get_coords(prisch_list)

Now let's ensure that there are no NA values.

prisch_coords[(is.na(prisch_coords$postal) | is.na(prisch_coords$latitude) | is.na(prisch_coords$longitude)), ]
[1] address   postal    latitude  longitude
<0 rows> (or 0-length row.names)

As we can see, there are no rows with NA values, so we can proceed to combine the coordinates with their respective primary school names.

prisch_coords = prisch_coords[c("postal","latitude", "longitude")]
pri_schl <- left_join(pri_schl, prisch_coords, by = c('postal_code' = 'postal'))

Let's take a look at the dataframe now.

pri_schl
# A tibble: 183 × 6
   school_name                    address        posta…¹ mainl…² latit…³ longi…⁴
   <chr>                          <chr>          <chr>   <chr>   <chr>   <chr>  
 1 ADMIRALTY PRIMARY SCHOOL       11   WOODLAND… 738907  PRIMARY 1.4426… 103.80…
 2 AHMAD IBRAHIM PRIMARY SCHOOL   10   YISHUN S… 768643  PRIMARY 1.4331… 103.83…
 3 AI TONG SCHOOL                 100  Bright H… 579646  PRIMARY 1.3605… 103.83…
 4 ALEXANDRA PRIMARY SCHOOL       2A   Prince C… 159016  PRIMARY 1.2913… 103.82…
 5 ANCHOR GREEN PRIMARY SCHOOL    31   Anchorva… 544969  PRIMARY 1.3903… 103.88…
 6 ANDERSON PRIMARY SCHOOL        19   ANG MO K… 569785  PRIMARY 1.3841… 103.84…
 7 ANG MO KIO PRIMARY SCHOOL      20   ANG MO K… 569920  PRIMARY 1.3687… 103.83…
 8 ANGLO-CHINESE SCHOOL (JUNIOR)  16   WINSTEDT… 227988  PRIMARY 1.3093… 103.84…
 9 ANGLO-CHINESE SCHOOL (PRIMARY) 50   BARKER R… 309918  PRIMARY 1.3187… 103.83…
10 ANGSANA PRIMARY SCHOOL         3    Tampines… 529366  PRIMARY 1.3484… 103.95…
# … with 173 more rows, and abbreviated variable names ¹​postal_code,
#   ²​mainlevel_code, ³​latitude, ⁴​longitude
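Note that latitude and longitude are shown as character (<chr>) columns in the tibble above. The conversion to sf in this exercise works as is, but converting them to numbers explicitly makes the intent clearer and surfaces any malformed values as NA. A minimal sketch on a made-up two-row tibble (the names and values are hypothetical):

```r
library(dplyr)

# Toy stand-in for pri_schl, with coordinates stored as text (made-up values)
pri_schl_demo <- tibble(
  school_name = c("SCHOOL A", "SCHOOL B"),
  latitude    = c("1.4426", "1.4331"),
  longitude   = c("103.80", "103.83")
)

# Convert both coordinate columns to numeric in one step
pri_schl_demo <- pri_schl_demo %>%
  mutate(across(c(latitude, longitude), as.numeric))

glimpse(pri_schl_demo)

# Any value that could not be parsed would show up here as NA
anyNA(pri_schl_demo$latitude) || anyNA(pri_schl_demo$longitude)
```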

Now that we have combined the dataframes, let's convert the result to an sf object, assign its CRS, and transform it to EPSG:3414.

prisch_sf <- st_as_sf(pri_schl,
                    coords = c("longitude", 
                               "latitude"),
                    crs=4326) %>%
  st_transform(crs = 3414)
st_crs(prisch_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Now let's find the number of primary schools within 1 km of each resale HDB flat using the get_within() function.

rs_coords_sf <- get_within(rs_coords_sf, prisch_sf, 1000, "Within_1KM_PriSchl")

3.2.4 Good Primary Schools (Top 10)

We will use the Salary.sg ranking at the URL below to get the top 10 primary schools in Singapore.

url <- "https://www.salary.sg/2021/best-primary-schools-2021-by-popularity/"

good_pri <- data.frame()

schools <- read_html(url) %>%
  html_nodes(xpath = paste('//*[@id="post-3068"]/div[3]/div/div/ol/li') ) %>%
  html_text() 

for (i in (schools)){
  sch_name <- toupper(gsub(" – .*","",i))
  sch_name <- gsub("\\(PRIMARY SECTION)","",sch_name)
  sch_name <- trimws(sch_name)
  new_row <- data.frame(pri_sch_name=sch_name)
  # Add the row
  good_pri <- rbind(good_pri, new_row)
}

top_good_pri <- head(good_pri, 10)
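Since gsub(), toupper() and trimws() all work on whole vectors, the name clean-up in the loop above could also be written without a loop. The sketch below assumes each scraped entry looks like "Name – extra text"; the two entries are made up and only stand in for the scraped schools vector.

```r
# Hypothetical scraped entries, standing in for the `schools` vector above
schools_demo <- c(
  "Nanyang Primary School \u2013 5 votes",
  "Catholic High School (Primary Section) \u2013 3 votes"
)

# Drop everything from " – " onwards and convert to upper case
sch_name <- toupper(gsub(" \u2013 .*", "", schools_demo))

# Remove the "(PRIMARY SECTION)" suffix as a literal string, then trim spaces
sch_name <- gsub("(PRIMARY SECTION)", "", sch_name, fixed = TRUE)
sch_name <- trimws(sch_name)

# Same structure as good_pri / top_good_pri above
good_pri_demo <- data.frame(pri_sch_name = sch_name)
top_demo <- head(good_pri_demo, 10)
print(top_demo)
```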

Now that we have the top 10 primary schools, let's check whether the names in top_good_pri match those in prisch_sf.

top_good_pri$pri_sch_name[!top_good_pri$pri_sch_name %in% prisch_sf$school_name]
[1] "CHIJ ST. NICHOLAS GIRLS’ SCHOOL" "CATHOLIC HIGH SCHOOL"           
[3] "ST. HILDA’S PRIMARY SCHOOL"     

As we can see, the schools listed below are not found in prisch_sf.

  • CHIJ St. Nicholas Girls’ School

  • Catholic High School

  • St. Hilda’s Primary School

This is because the first two schools are classified as ‘Mixed Levels’ and were therefore filtered out of pri_schl, so we need to use get_coords() to get their coordinates. St. Hilda’s Primary School, however, is a primary school. On closer investigation, it is missing only because the scraped name uses a typographic apostrophe (’) whereas the name in prisch_sf uses a straight apostrophe ('). Hence, this needs to be changed.

top_good_pri$pri_sch_name[top_good_pri$pri_sch_name == "ST. HILDA’S PRIMARY SCHOOL"] <- "ST. HILDA'S PRIMARY SCHOOL"
top_good_pri$pri_sch_name[!top_good_pri$pri_sch_name %in% prisch_sf$school_name]
[1] "CHIJ ST. NICHOLAS GIRLS’ SCHOOL" "CATHOLIC HIGH SCHOOL"           
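Rather than correcting one name by hand, the typographic apostrophe could be replaced in every scraped name at once. This is a small sketch using a hypothetical helper, normalise_apostrophe(); whether the resulting names then match the school dataset would still need to be checked as above.

```r
# Hypothetical helper: replace the typographic apostrophe (U+2019) with a straight one
normalise_apostrophe <- function(x) {
  gsub("\u2019", "'", x, fixed = TRUE)
}

scraped <- c(
  "ST. HILDA\u2019S PRIMARY SCHOOL",
  "CHIJ ST. NICHOLAS GIRLS\u2019 SCHOOL"
)

cleaned <- normalise_apostrophe(scraped)
print(cleaned)
```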

As we can see, we have rectified that. Now let's use get_coords() to get the coordinates.

goodprisch_coords <- get_coords(unique(top_good_pri$pri_sch_name))

Let's check if any of the values are NA.

goodprisch_coords[(is.na(goodprisch_coords$postal) | is.na(goodprisch_coords$latitude) | is.na(goodprisch_coords$longitude)), ]
[1] address   postal    latitude  longitude
<0 rows> (or 0-length row.names)

None of the values are NA; hence, we can proceed to convert the result to an sf dataframe and transform it to the correct CRS.

goodprischl_sf <- st_as_sf(goodprisch_coords,
                    coords = c("longitude", 
                               "latitude"),
                    crs=4326) %>%
  st_transform(crs = 3414)
st_crs(goodprischl_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Now that we have verified the CRS, let's calculate the proximity of the HDB flats to the good primary schools using the get_prox() function.

rs_coords_sf <- get_prox(rs_coords_sf, goodprischl_sf, "PROX_GoodPriSchls")

3.2.5 Write to RDS

Now that our resale subset data is complete with all the locational factors, we can save it into an rds file.

rs_factors_rds <- write_rds(rs_coords_sf, "data/rds/rs_factors.rds")

4.0 Geospatial Data

Let's import the Master Plan 2019 Subzone data.

mpsz_sf <- st_read(dsn = "data/geospatial", layer = "MPSZ-2019")
Reading layer `MPSZ-2019' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

We can see that the geodetic CRS is WGS 84; hence, we need to transform it to EPSG:3414 (SVY21).

mpsz_sf <- st_transform(mpsz_sf, 3414)
st_crs(mpsz_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Now that we have verified the CRS, let's check for invalid geometries.

length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 6

We can see that there are 6 invalid geometries. Let's rectify that!

mpsz_sf <- st_make_valid(mpsz_sf)
length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 0
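To illustrate what st_make_valid() does, here is a small sketch with a deliberately invalid "bowtie" polygon whose edges cross each other. The polygon is made up for illustration and is not taken from the subzone data.

```r
library(sf)

# A self-intersecting "bowtie" ring: the edges cross at (0.5, 0.5)
bowtie <- st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
)))
bowtie_sf <- st_sf(id = 1, geometry = st_sfc(bowtie, crs = 3414))

# Count invalid geometries before repair
n_invalid_before <- length(which(!st_is_valid(bowtie_sf)))

# Repair and count again
fixed_sf <- st_make_valid(bowtie_sf)
n_invalid_after <- length(which(!st_is_valid(fixed_sf)))

print(c(before = n_invalid_before, after = n_invalid_after))
```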

5.0 Resale with Locational Factors

Let's look into the rds file we created.

rs_sf <- read_rds("data/rds/rs_factors.rds")
glimpse(rs_sf)
Rows: 23,656
Columns: 35
$ month                       <chr> "2021-01", "2021-01", "2021-01", "2021-01"…
$ town                        <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", …
$ address                     <chr> "547 ANG MO KIO AVE 10", "414 ANG MO KIO A…
$ block                       <chr> "547", "414", "509", "467", "571", "134", …
$ street_name                 <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 10", …
$ flat_type                   <chr> "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", "4…
$ storey_range                <chr> "04 TO 06", "01 TO 03", "01 TO 03", "07 TO…
$ floor_area_sqm              <dbl> 92, 92, 91, 92, 92, 98, 92, 92, 92, 92, 92…
$ flat_model                  <chr> "New Generation", "New Generation", "New G…
$ lease_commence_date         <dbl> 1981, 1979, 1980, 1979, 1979, 1978, 1977, …
$ remaining_lease_mths        <dbl> 708, 693, 702, 695, 689, 681, 661, 682, 69…
$ resale_price                <dbl> 370000, 375000, 380000, 385000, 410000, 41…
$ postal                      <chr> "560547", "560414", "560509", "560467", "5…
$ geometry                    <POINT [m]> POINT (30770.07 39578.64), POINT (30…
$ PROX_BusStops               <dbl> 0.16157609, 0.16740841, 0.07424143, 0.0887…
$ PROX_ChildCare              <dbl> 2.493662e-01, 6.715056e-02, 1.385583e-01, …
$ PROX_Dengue                 <dbl> 1.65589777, 1.49632531, 0.90309545, 1.7025…
$ PROX_ElderCare              <dbl> 1.08567795, 0.15039052, 0.72242472, 0.0981…
$ PROX_Gym                    <dbl> 1.2218265, 0.7301978, 0.5600145, 0.6700282…
$ PROX_HawkerCentre           <dbl> 0.4442515, 0.2050009, 0.4495734, 0.3190679…
$ PROX_Kindergartens          <dbl> 0.24936617, 0.40933118, 0.31906250, 0.0554…
$ PROX_MRT                    <dbl> 1.0486763, 0.7574007, 0.4567509, 0.8868554…
$ PROX_PrivateInstitutes      <dbl> 0.7530576, 0.5080930, 0.4481675, 0.7259754…
$ PROX_Parks                  <dbl> 50.06755, 48.87507, 49.50413, 49.20320, 49…
$ PROX_PreSchools             <dbl> 2.493662e-01, 6.715056e-02, 1.385583e-01, …
$ PROX_Supermarket            <dbl> 0.4184204, 0.1946009, 0.4435109, 0.4269715…
$ Within_350M_Kindergarten    <int> 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, …
$ Within_350M_ChildCare       <int> 2, 3, 3, 3, 3, 2, 6, 3, 3, 3, 3, 3, 5, 2, …
$ Within_350M_BusStops        <int> 4, 7, 10, 4, 8, 2, 8, 7, 6, 7, 7, 7, 8, 8,…
$ Within_1KM_PreSchools       <int> 16, 23, 28, 19, 28, 32, 29, 32, 21, 25, 17…
$ Within_1KM_PrivateInstitute <int> 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 0, 2, 1, …
$ PROX_CBD                    <dbl> 9.564575, 8.401690, 9.516492, 8.580908, 9.…
$ PROX_ShoppingMalls          <dbl> 1.2162873, 0.8444977, 0.5600960, 0.9304149…
$ Within_1KM_PriSchl          <int> 1, 3, 2, 3, 2, 2, 3, 2, 3, 3, 1, 2, 3, 2, …
$ PROX_GoodPriSchls           <dbl> 1.8744120, 1.4292244, 1.7689255, 1.7759141…

As we take a deeper look into our sf dataframe, we can see that the storey_range column holds character data, with each value being a range of storeys. Hence, it is a categorical variable (categorical variables represent data that can be divided into groups).

Categorical variables need to be handled carefully in regression analysis, since regression models require numerical input variables to make predictions. Hence, categorical variables (storey_range in our case) can’t be used directly. They need to be transformed into numerical variables using encoding techniques such as dummy coding or one-hot encoding.

However, in our case the variable can be ordered from low to high, as the storey ranges have a natural order. Flats in a higher storey_range tend to be pricier than those in a lower one, which affects the HDB resale price. Hence, instead of dummy coding, we will sort the storey_range categories and assign numerical values in ascending order.

5.1 Extract the sorted unique storey_range

storeys <- sort(unique(rs_sf$storey_range))

5.2 Create a dataframe to store order

storey_order <- 1:length(storeys)
storey_range_order <- data.frame(storeys, storey_order)

Now let's take a look at the dataframe created.

head(storey_range_order)
   storeys storey_order
1 01 TO 03            1
2 04 TO 06            2
3 07 TO 09            3
4 10 TO 12            4
5 13 TO 15            5
6 16 TO 18            6

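As we can see, the storeys are correctly assigned to storey_order in ascending order. This works because the storey ranges are zero-padded (for example "01 TO 03"), so sorting them as text gives the same order as sorting them by storey number.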

5.3 Combine the Storey Order with Resale

rs_sf <- left_join(rs_sf, storey_range_order, by= c("storey_range" = "storeys"))
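The same ordinal encoding can be sketched without a lookup table by using match() against the sorted unique values. The toy vector below is made up; the lower-bound check at the end is only there to confirm that the text-based order agrees with the numeric storey order.

```r
# Toy storey ranges (made up), in the same format as storey_range
storey_range <- c("04 TO 06", "01 TO 03", "10 TO 12", "04 TO 06")

# Sorted unique categories, as in section 5.1
storeys <- sort(unique(storey_range))

# Position of each value in the sorted categories = its ordinal code
storey_order <- match(storey_range, storeys)
print(storey_order)

# Cross-check: the numeric lower bound of each range should rise with the code
lower_storey <- as.integer(substr(storey_range, 1, 2))
consistent <- all(diff(lower_storey[order(storey_order)]) >= 0)
print(consistent)
```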

5.4 Select Required Columns for Analysis

Now let's drop the columns we do not need and select only those required for analysis.

rs_req <- rs_sf %>%
  select(resale_price, floor_area_sqm, storey_order, remaining_lease_mths,
         PROX_BusStops, PROX_ChildCare, PROX_Dengue, PROX_ElderCare, PROX_Gym,
         PROX_HawkerCentre, PROX_Kindergartens, PROX_PrivateInstitutes,
         PROX_Parks, PROX_PreSchools, PROX_Supermarket,
         Within_350M_Kindergarten, Within_350M_ChildCare, Within_350M_BusStops,
         Within_1KM_PreSchools, Within_1KM_PrivateInstitute,
         PROX_CBD, PROX_ShoppingMalls, Within_1KM_PriSchl, PROX_GoodPriSchls,
         PROX_MRT)
summary(rs_req)
  resale_price     floor_area_sqm    storey_order    remaining_lease_mths
 Min.   : 250000   Min.   : 70.00   Min.   : 1.000   Min.   : 534.0      
 1st Qu.: 440000   1st Qu.: 91.00   1st Qu.: 2.000   1st Qu.: 786.0      
 Median : 490000   Median : 93.00   Median : 3.000   Median : 949.0      
 Mean   : 526124   Mean   : 94.67   Mean   : 3.475   Mean   : 945.5      
 3rd Qu.: 568000   3rd Qu.:100.00   3rd Qu.: 4.000   3rd Qu.:1115.0      
 Max.   :1370000   Max.   :145.00   Max.   :17.000   Max.   :1168.0      
 PROX_BusStops     PROX_ChildCare    PROX_Dengue    PROX_ElderCare  
 Min.   :0.01585   Min.   :0.0000   Min.   :0.000   Min.   :0.0000  
 1st Qu.:0.07295   1st Qu.:0.0692   1st Qu.:1.038   1st Qu.:0.3186  
 Median :0.10648   Median :0.1135   Median :2.323   Median :0.6004  
 Mean   :0.11367   Mean   :0.1230   Mean   :2.922   Mean   :0.7803  
 3rd Qu.:0.14474   3rd Qu.:0.1673   3rd Qu.:4.388   3rd Qu.:1.0839  
 Max.   :0.39147   Max.   :0.5865   Max.   :8.774   Max.   :3.3016  
    PROX_Gym       PROX_HawkerCentre PROX_Kindergartens  PROX_PrivateInstitutes
 Min.   :0.03343   Min.   :0.0306    Min.   :0.0000001   Min.   :0.000001      
 1st Qu.:0.70195   1st Qu.:0.3965    1st Qu.:0.1702019   1st Qu.:0.566519      
 Median :1.12108   Median :0.6845    Median :0.2723672   Median :0.939001      
 Mean   :1.31143   Mean   :0.7963    Mean   :0.2938084   Mean   :1.105949      
 3rd Qu.:1.68694   3rd Qu.:1.0251    3rd Qu.:0.3908445   3rd Qu.:1.588907      
 Max.   :4.57273   Max.   :2.8676    Max.   :1.1478605   Max.   :3.192992      
   PROX_Parks    PROX_PreSchools   PROX_Supermarket Within_350M_Kindergarten
 Min.   :37.27   Min.   :0.00000   Min.   :0.0000   Min.   :0.000           
 1st Qu.:45.22   1st Qu.:0.06435   1st Qu.:0.1621   1st Qu.:0.000           
 Median :51.95   Median :0.10568   Median :0.2516   Median :1.000           
 Mean   :49.46   Mean   :0.11336   Mean   :0.2745   Mean   :1.005           
 3rd Qu.:54.24   3rd Qu.:0.15592   3rd Qu.:0.3624   3rd Qu.:1.000           
 Max.   :57.88   Max.   :0.54739   Max.   :1.5713   Max.   :7.000           
 Within_350M_ChildCare Within_350M_BusStops Within_1KM_PreSchools
 Min.   : 0.000        Min.   : 0.000       Min.   : 5.00        
 1st Qu.: 3.000        1st Qu.: 6.000       1st Qu.:21.00        
 Median : 4.000        Median : 8.000       Median :27.00        
 Mean   : 3.896        Mean   : 7.958       Mean   :28.78        
 3rd Qu.: 5.000        3rd Qu.:10.000       3rd Qu.:34.00        
 Max.   :20.000        Max.   :19.000       Max.   :70.00        
 Within_1KM_PrivateInstitute    PROX_CBD       PROX_ShoppingMalls
 Min.   : 0.000              Min.   : 0.9994   Min.   :0.0000    
 1st Qu.: 0.000              1st Qu.: 9.7018   1st Qu.:0.3736    
 Median : 1.000              Median :13.1064   Median :0.5756    
 Mean   : 1.558              Mean   :12.1306   Mean   :0.6512    
 3rd Qu.: 1.000              3rd Qu.:14.8850   3rd Qu.:0.8535    
 Max.   :56.000              Max.   :19.6501   Max.   :2.3216    
 Within_1KM_PriSchl PROX_GoodPriSchls     PROX_MRT                geometry    
 Min.   :0.000      Min.   : 0.06525   Min.   :0.02179   POINT        :23656  
 1st Qu.:2.000      1st Qu.: 2.29192   1st Qu.:0.27155   epsg:3414    :    0  
 Median :3.000      Median : 3.63387   Median :0.48357   +proj=tmer...:    0  
 Mean   :3.209      Mean   : 3.99284   Mean   :0.56812                        
 3rd Qu.:4.000      3rd Qu.: 5.41144   3rd Qu.:0.77638                        
 Max.   :9.000      Max.   :10.62237   Max.   :2.12909